We consider three simple approaches to rounding error in least squares
regression.  The  first  treats  the  rounded  data  as  if  they  were  unrounded,  the 
second  adds  an  adjustment  to  the  diagonal  of  the  covariance  matrix  of  the 
variables,  and  the  third  subtracts  an  adjustment  from  the  diagonal.  The 
third,  Sheppard's  corrections,  can  be  motivated  as  maximum  likelihood  with 
small  rounding  error  and  either  (1)  joint  normal  data  or  (2)  normal  residuals, 
"regular"  independent  variables,  and  large  samples.  Although  an  example  and 
theory  suggest  that  the  third  approach  is  usually  preferable  to  the  first  two, 
a  generally  satisfactory  attack  on  rounding  error  in  regression  requires  the 
specification  of  the  full  distribution  of  variables,  and  convenient 
computational  methods  for  this  problem  are  not  currently  available. 
ROUNDING ERROR IN REGRESSION:

THE APPROPRIATENESS OF SHEPPARD'S CORRECTIONS


Arthur  P.  Dempster  and  Donald  B.  Rubin 


1. Introduction

Our  purpose  is  to  clarify  the  problem  of  adjusting  estimated  regression 
coefficients  for  rounding  errors  in  the  data.  First,  we  contrast  three 
methodologies  which  have  been  suggested.  Two  of  these  lead  to  simple  but 
different  adjustments.  The  remaining  methodology  uses  likelihood  analysis  and 
leads  to  adjustments  which  depend  on  the  choice  of  a  prior  (marginal) 
distribution  for  the  design  matrix.  Second,  we  derive  some  details  of 
likelihood analysis for the limiting case of small rounding error. We use our
results  to  point  out  two  circumstances  under  which  likelihood  analysis  leads 
approximately  to  adjustment  via  Sheppard's  (1898)  corrections. 

Adjustments  for  rounding  error  can  be  surprisingly  large,  especially  when 
compared  to  the  sampling  standard  deviation  of  estimated  regression 
coefficients.  In  fact,  although  the  standard  deviation  is  generally 
proportional to $n^{-1/2}$ as sample size n increases, the rounding error
adjustment  does  not  decrease  as  n  increases.  Furthermore,  the  size  of  the 
adjustment  is  substantially  increased  when  the  design  matrix  is  ill- 
conditioned,  so  that  well-known  numerical  accuracy  problems  associated  with 
ill-conditioned  design  matrices  are  complemented  by  less  well-known,  but  often 
practically  important,  rounding  error  problems. 
An artificial numerical example serves to illustrate potential
differences among adjustment techniques. We construct a 4-variate normal
distribution of $(Y, X_1, X_2, X_3)$ by specifying zero means and a covariance
matrix that comes from allowing $(Y, X_1, X_2, X_3)$ to depend on NID(0,1)
variables $(Z_1, Z_2, Z_3, Z_4)$, as follows:

$$ Y = Z_1, \quad X_1 = \rho Z_1 + \sqrt{1-\rho^2}\, Z_2, \quad X_2 = \rho Z_1 + \sqrt{1-\rho^2}\, Z_3, \quad X_3 = \rho Z_1 + \sqrt{(1-\rho^2)/2}\, (Z_2 + Z_4) . $$

To obtain a reasonably ill-conditioned design matrix we set $\rho = .9$. Five
different regression fits of the form $b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$ are
summarized in Table 1, along with the associated multiple $R^2$. Estimated
standard deviations are shown in parentheses for two of the fits.
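
The construction above can be sketched numerically. The following is a minimal sketch, not the authors' code: it assumes that the last term of $X_3$ is $(Z_2 + Z_4)$, and the names `A`, `cov` and `coefficients_from_cov` are illustrative. Under that reading the population fit has two coefficients near .2705, one near .4618, and a multiple correlation near .95.

```python
import numpy as np

rho = 0.9
s = np.sqrt(1.0 - rho**2)

# Loadings of (Y, X1, X2, X3) on independent N(0,1) variables (Z1, Z2, Z3, Z4).
# Assumption: X3 loads on Z2 and Z4 (the source text is hard to read there).
A = np.array([
    [1.0, 0.0, 0.0, 0.0],                            # Y  = Z1
    [rho, s, 0.0, 0.0],                              # X1 = rho*Z1 + sqrt(1-rho^2)*Z2
    [rho, 0.0, s, 0.0],                              # X2 = rho*Z1 + sqrt(1-rho^2)*Z3
    [rho, s / np.sqrt(2.0), 0.0, s / np.sqrt(2.0)],  # X3 = rho*Z1 + sqrt((1-rho^2)/2)*(Z2+Z4)
])

cov = A @ A.T  # covariance matrix of (Y, X1, X2, X3)


def coefficients_from_cov(c):
    """Least squares slopes and multiple correlation from a covariance
    matrix whose first row/column refers to the response."""
    b = np.linalg.solve(c[1:, 1:], c[1:, 0])
    r = np.sqrt(b @ c[1:, 0] / c[0, 0])
    return b, r


b_true, r_true = coefficients_from_cov(cov)
print(np.round(cov, 4))
print(np.round(b_true, 4), round(float(r_true), 4))
```

Working from the covariance matrix rather than from simulated rows corresponds to the infinite-sample fit, so no sampling error enters.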

The  first  column  in  Table  1  shows  the  fit  obtained  with  the  actual 
covariance  matrix,  corresponding  to  an  infinite  sample  or  equivalently  to  the 
true  model,  so  that  sampling  error  is  zero.  The  second  column  of  Table  1  is 
based on a random sample of size n = 10,000 from the 4-variate normal as
defined above with $\rho = .9$. The sampling standard deviations of the
estimated $b_1, b_2, b_3$ are calculated in the usual least squares way. The
differences between the first two columns are comfortably within $2\sigma$ limits
for error.
The remaining three columns of Table 1 are based on analyses of the same
sample of 10,000 as the second column, except that the data were rounded
before analysis. Specifically, $Y$ was rounded to the form □.□□□, $X_1$
was rounded to □.□□, $X_2$ was rounded to □.□, and $X_3$ was rounded
to □. If we perform least squares fitting on the rounded data, we obtain the
results shown in column 3 of Table 1, where the standard deviations are
computed from the usual formula ignoring rounding. Comparing columns 1, 2,
and 3 shows that rounding can lead to quite degraded accuracy of estimation,
and that nominal sampling standard deviations can be misleading indicators of
typical error.

Columns 4 and 5 of Table 1 assess the results of two simple adjustment
strategies; one gives reasonable results, whereas the other is worse than no
adjustment at all. Column 4, labelled Sheppard, is obtained by subtracting a
term of the form $\delta^2/12$ from the diagonal elements of the sample covariance
matrix calculated using the rounded data, where $\delta$ denotes the width of the
rounding interval, i.e., .001 for $Y$, .01 for $X_1$, .1 for $X_2$, and 1.0
for $X_3$. Column 5, headed BRB, was obtained in exactly the same way as column
4, except that the appropriate $\delta^2/12$ was added to each diagonal element of
the sample covariance matrix. The letters BRB refer to Beaton, Rubin, and
Barone (1976) who present an analysis of rounding error in regression which
could lead the unwary to an adjustment similar to that shown in column 5. We
discuss the BRB analysis in Section 2. Our theoretical results imply that
nonnormal but regular distributions for $(X_1, X_2, X_3)$ would produce similar
outcomes, where regular is defined in Section 4, but that the outcomes would
be different for a nonregular distribution of $(X_1, X_2, X_3)$, uniform for
example.
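
The three estimates compared here (no adjustment, subtracting $\delta^2/12$, adding $\delta^2/12$) can be sketched as follows. This is an illustrative simulation under stated assumptions, not a reproduction of Table 1: the seed is arbitrary, $X_3$ is assumed to load on $Z_2$ and $Z_4$, and the names `slopes`, `estimates` and `max_error` are hypothetical.

```python
import numpy as np


def slopes(c):
    """Least squares slopes from a covariance matrix whose first
    row/column refers to the response Y."""
    return np.linalg.solve(c[1:, 1:], c[1:, 0])


rng = np.random.default_rng(0)   # arbitrary seed, not from the paper
n, rho = 10_000, 0.9
s = np.sqrt(1.0 - rho**2)

# Loadings of (Y, X1, X2, X3) on (Z1, Z2, Z3, Z4).
# Assumption: X3 loads on Z2 and Z4.
A = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [rho, s, 0.0, 0.0],
    [rho, 0.0, s, 0.0],
    [rho, s / np.sqrt(2.0), 0.0, s / np.sqrt(2.0)],
])
data = rng.standard_normal((n, 4)) @ A.T      # columns: Y, X1, X2, X3

delta = np.array([0.001, 0.01, 0.1, 1.0])     # rounding widths for Y, X1, X2, X3
rounded = np.round(data / delta) * delta      # round each column to its grid

c_star = np.cov(rounded, rowvar=False)        # sample covariance of rounded data
shift = np.diag(delta**2 / 12.0)

b_true = slopes(A @ A.T)
estimates = {
    "uncorrected": slopes(c_star),
    "sheppard": slopes(c_star - shift),       # subtract delta^2/12 on the diagonal
    "brb": slopes(c_star + shift),            # add delta^2/12 on the diagonal
}
max_error = {k: float(np.max(np.abs(b - b_true))) for k, b in estimates.items()}
for name, b in estimates.items():
    print(f"{name:12s}", np.round(b, 4), round(max_error[name], 4))
```

All three estimates use the same rounded sample and differ only in the diagonal of the covariance matrix, so any difference in error is attributable to the adjustment.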
Table 1

True       Unrounded    Uncorrected    Sheppard     BRB
Model      Sample       Rounded        Corrected    Corrected

.2705      .2791        .4450          .2987        .5260
           (.0098)      (.0087)

.2705      .2610        .1634          .2534        .1250
           (.0098)      (.0079)

.4618      .4536        .3738          .4502        .3176
           (.0055)      (.0052)

.9500      .9495        .9393          .9488        .9329

Five sets of regression coefficients and associated multiple
correlations and standard deviations.
Probability models for rounding errors must be interpreted with great
care if they are to lead to sound adjustments for rounding error. In support
of this proposition we review two theoretical arguments which lead to
estimates such as those shown in columns 3 and 5 of Table 1. We then
introduce likelihood analysis, and draw attention to an essential difference
from the other two arguments: likelihood analysis uses the conditional
distribution of the unobserved unrounded values given the rounded data,
whereas the other arguments use only the marginal distribution of the
difference between rounded and unrounded values.

Consider data generated by the familiar linear model

$$ Y = 1\beta_0 + X\beta_1 + E . \tag{1} $$

The $n \times 1$ response vector $Y$ is a linear combination of $k$ predictor
variables, where $X$ denotes the $n \times k$ design matrix giving the values of
the $k$ predictors for the $n$ observations, $1$ is the $n \times 1$ vector of
ones, and $\beta = (\beta_0, \beta_1^T)^T$ is the $(k+1) \times 1$ vector of linear
regression coefficients. The residual variation denoted by the $n \times 1$ random
vector $E$ is assumed to consist of independent $N(0, \sigma^2)$ components.
Normality is not required for the first two arguments we present, but complete
model specification is needed for likelihood analysis.

In principle, $Y$ and $X$ are directly observable whereas $\beta$ and $E$ are
unknown, but in practice we observe only rounded values $Y^*$ and $X^*$
differing from $Y$ and $X$ by rounding error which we denote by $e$ and $d$,
i.e.,

$$ (Y^*, X^*) = (Y, X) + (e, d) . \tag{2} $$

After observing $(Y^*, X^*)$ we can say with certainty only that the true $(Y, X)$
lies in a rectangular region centered at $(Y^*, X^*)$, or equivalently that
$(e, d)$ lies in a congruent rectangle translated to the origin.

Theoretical analysis requires hypotheses about the distribution of
$(e, d)$. Initially we assume that the rounding error may be regarded as
uniformly distributed over the rectangle. In Section 3, we introduce specific
notation for the probability density of $(e, d)$.

Combining equations (1) and (2) yields

$$ Y^* = 1\beta_0 + X^*\beta_1 + (E - d\beta_1 + e) , \tag{3} $$

which has the same form as (1) except that $E$ is replaced by $E - d\beta_1 + e$.
Least squares with model (1) can be motivated when the $n$ components of $E$ are
uncorrelated, with zero means and constant variance; consequently, least
squares with model (3) can be motivated if a similar condition plausibly holds
for the $n$ components of $E - d\beta_1 + e$, because then a least squares analysis
based on $(Y^*, X^*)$ should produce unbiased estimates of $\beta$ with Gauss-Markoff
optimality. Durbin (1954) records the part of this argument depending on zero
means of $E - d\beta_1 + e$ to conclude that the uncorrected least squares estimate is
unbiased. This argument leads to no adjustment for rounding and thus to the
estimate in column 3 of Table 1. Cochran (1968) provides further discussion.

Our second theoretical argument is an extension of the Beaton, Rubin, and
Barone (1976) study of rounding in regression. BRB use computer simulation to
recreate the unknown $(Y, X)$ from the observed $(Y^*, X^*)$, and then compute a
least squares estimate of $\beta$ from the simulated $(Y, X)$. Since a single
choice of $(Y, X)$ may not be typical, BRB repeat the simulation many times,
drawing $(e, d)$ each time from a uniform distribution over the rounding
rectangle and computing a least squares $\hat\beta$ for each $(Y, X)$. The main point
of BRB is that the observed variation among these recreated least squares
estimates is useful, in the numerical analysis sense, for exhibiting the range
of the possible disturbances due to rounding.

BRB illustrate the technique on the much analyzed Longley (1967) data.
They average the simulated $\hat\beta$ vectors for the Longley data in order to show a
substantial systematic difference between the simulated $\hat\beta$ and uncorrected
least squares applied to the rounded data.

The BRB average $\hat\beta$ over a long sequence is approximately

$$ \operatorname{ave}_{d,e}\, [(X^*+d)^T (X^*+d)]^{-1} [(X^*+d)^T (Y^*+e)] . \tag{4} $$

As BRB show, for small rounding intervals, the use of (4) is effectively the
same as adding the appropriate $\delta^2/12$ terms to the diagonals of the sample
covariance matrix of $(Y^*, X^*)$, as illustrated in column 5 of Table 1. With
our artificial data, we have an advantage over BRB with the Longley data,
because we know the true $\beta$ and so can see directly that (4) appears to be
defective as an adjusted estimator.
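
The near-equivalence between averaging perturbed fits and adding $\delta^2/12$ to the diagonal can be checked numerically. The sketch below uses hypothetical data (two independent predictors, slopes 1.0 and -0.5, a common rounding width of 0.5) and an arbitrary seed; none of these values come from BRB or from Table 1, and the procedure is only a simplified stand-in for the BRB perturbation scheme.

```python
import numpy as np

rng = np.random.default_rng(1)                 # arbitrary seed
n, draws = 2000, 200
delta = np.array([0.5, 0.5, 0.5])              # hypothetical widths for (Y, X1, X2)

# Hypothetical unrounded data, then rounding to the grid of width delta.
x = rng.standard_normal((n, 2))
y = x @ np.array([1.0, -0.5]) + 0.5 * rng.standard_normal(n)
rounded = np.round(np.column_stack([y, x]) / delta) * delta


def slopes(c):
    """Least squares slopes from a covariance matrix (response first)."""
    return np.linalg.solve(c[1:, 1:], c[1:, 0])


# Perturbation step: add fresh uniform noise over the rounding rectangle,
# refit by least squares, and average the fitted slopes over many draws.
fits = []
for _ in range(draws):
    noise = rng.uniform(-0.5, 0.5, size=rounded.shape) * delta
    fits.append(slopes(np.cov(rounded + noise, rowvar=False)))
b_brb_average = np.mean(fits, axis=0)

c_star = np.cov(rounded, rowvar=False)
b_uncorrected = slopes(c_star)
b_diag_added = slopes(c_star + np.diag(delta**2 / 12.0))

print(np.round(b_uncorrected, 4))
print(np.round(b_brb_average, 4))
print(np.round(b_diag_added, 4))
```

Because the added noise is independent of the rounded data, its only systematic effect on the covariance matrix is the extra $\delta^2/12$ on the diagonal, which is why the averaged fit tracks the closed-form adjustment.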

We believe that the Durbin-Cochran and BRB approaches fail in our example
because the reasoning is insufficiently conditional. An important element in
the justification of least squares for the case of unrounded data from model
(1) is that, whatever may be the real world processes producing $X$ and $E$,
the two parts must be unrelated in the sense that knowledge of $X$ does not
provide any information about $E$. The parallel requirement fails in the case
of model (3), because the process of determining $X$ influences both $X^*$ and
$E - d\beta_1 + e$ jointly, i.e., $X$ determines both $X^*$ and $d$, so that
$E - d\beta_1 + e$ no longer has its initial approximately uniform distribution
conditional on the observed $X^*$.
While the Durbin-Cochran argument implicitly assumes that the a priori
distribution of $E - d\beta_1 + e$ remains valid given $X^*$, the BRB argument goes a
step further and implicitly assumes the initial distribution of $-d\beta_1 + e$ holds
given both $Y^*$ and $X^*$. We say this because the BRB device is to draw from
the initial uniform distribution of $d$ and $e$ after $Y^*$ and $X^*$ are fixed.
The implicit assumption fails because observation of $Y^*$ and $X^*$ can convey
a substantial amount of information about $d$ and $e$, especially when large
correlations exist among the variables.

The underlying idea of likelihood analysis is to consider the sampling
density of $(Y, X, Y^*, X^*)$ given $\beta$ and $\sigma^2$. Holding $(Y^*, X^*)$ fixed at
their observed values in this density leads to a function of $(\beta, \sigma^2, Y, X)$.
To obtain a likelihood function of the parameter $(\beta, \sigma^2)$, it is necessary to
integrate out the random variable $(Y, X)$ given the fixed $(\beta, \sigma^2, Y^*, X^*)$,
which is equivalent to integrating $d$ and $e$ over their conditional
distribution given $Y^*$, $X^*$, $\beta$ and $\sigma^2$.

Likelihood  analysis  of  rounding  error  was  introduced  by  Fisher  (1922)  and 
elaborated  by  Lindley  (1950).  Fisher  and  Lindley  show  that  likelihood 
analysis  justifies  the  use  of  Sheppard's  corrections  when  sampling  normal 
populations with small rounding error. In Section 3 we extend the Fisher-
Lindley analysis to more general regression models and show that non-Gaussian
assumptions  about  the  distribution  of  X  can  also  lead  to  likelihood 
justification  for  Sheppard's  corrections  in  large  samples. 

Derivation  of  Sheppard's  corrections  from  sampling  theory  may  be  found  in 
Eisenhart (1947), Haitovsky (1973), Kendall and Stuart (1962), and Wold
(1934).  Since  our  experience  with  simple  numerical  examples  like  that  shown 
in  Table  1  has  led  us  to  mistrust  the  use  of  standard  sampling  theory  to 
justify  adjustment  for  rounding,  we  do  not  review  literature  on  sampling  bases 
for  Sheppard's  corrections. 
3. Likelihood Analysis for Small Rounding Error

The analysis here uses the standard linear model (1) with independent
$N(0, \sigma^2)$ error components $E$. As noted in Section 2, likelihood analysis
requires that we average over the conditional distribution of $(Y, X)$ given
$(\beta, \sigma^2, Y^*, X^*)$. It follows that distributional assumptions about $X$ are
required to carry out the analysis and that the resulting adjustments may
depend on the distribution of $X$.

In the absence of rounding error, least squares estimates of $\beta$ can be
justified as maximum likelihood estimates based on model (1), independent of
any assumed sampling model for $X$ whose parameter $\theta$ does not depend on
$\beta$ or $\sigma^2$. Our problem is to find corrected maximum likelihood estimates when
rounding is present. We restrict details to first order corrections holding
in the limit when rounding error is small. Our theoretical analysis is thus
directed at finding small adjustments to the least squares estimates such that
the adjusted estimates are first order approximations to maximum likelihood
estimates.

We suppose that the rows of $X$ are independently distributed according
to a specified model depending on parameter $\theta$. Denoting the rows of $(Y, X)$
by $(Y_i, X_i)$ for $i = 1, 2, \ldots, n$, we suppose that $X_i$ has density
$g_i(\cdot \mid \theta)$. Hence, if we could observe the unrounded $(Y, X)$ the log
likelihood function would be

$$ L(\beta, \sigma^2, \theta) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - X_i \beta_1)^2 + \sum_{i=1}^{n} \log g_i(X_i \mid \theta) \tag{5} $$

which we call the complete-data log likelihood. We assume throughout our
discussion that the rounded data $(Y^*, X^*)$ are fixed at their observed
values, and hence the unadjusted estimates $\beta^*$, $\sigma^*$, and $\theta^*$ are also fixed
and known, where $(\beta^*, \sigma^*, \theta^*)$ is obtained by maximizing (5) after
substituting $(Y^*, X^*)$ for $(Y, X)$. Note that $\beta^*$ and $\sigma^*$ are found by
maximizing the first term of (5) whereas $\theta^*$ is found by maximizing the
second term.

Our mathematical discussion is heuristic in the sense that we do not
carry out the detailed analysis required to justify our mathematical argument
in terms of precise regularity conditions on the functions $g_i(\cdot \mid \cdot)$ in a
neighborhood of $(X^*, \theta^*)$. Our mathematical device for obtaining adjustments
is as follows. The EM algorithm of Dempster, Laird, and Rubin (1977) applies
to the computation of maximum likelihood estimates when data are incomplete,
as in our case when $(Y^*, X^*)$ is observed but $(Y, X)$ is not. The EM
technique is iterative, but the rate of convergence depends on the fraction of
missing information. When the rounding error is vanishingly small, the EM
technique converges in one iteration (to the desired first order of
approximation), starting from initial estimates $(\beta^*, \sigma^*, \theta^*)$.

The required iteration of the EM method has two steps. First, in the E-
step, we average the complete-data log-likelihood (5) over the unknown
$(Y, X)$ given the observed rounded $(Y^*, X^*)$ and the current estimates
$(\beta^*, \sigma^*, \theta^*)$. Second, in the M-step, we maximize the resulting function of
$(\beta, \sigma, \theta)$. Since we are concerned with adjusting $\beta^*$, we need to carry out
the E-step only for the first term in (5) which depends only on the familiar
sufficient statistics consisting of the sums, sums of squares and products
$\sum_{i=1}^{n} (1, Y_i, X_i)^T (1, Y_i, X_i)$. Having found the appropriate adjustments to
these sufficient statistics, the M-step by definition simply computes the
estimates in the usual (i.e., least squares) way from the adjusted
statistics. In particular, if we can show the first order corrections to the
quadratic statistics are Sheppard's corrections, then we have shown that least
squares applied to Sheppard-corrected basic statistics gives the desired first
order corrected maximum likelihood estimates.

To simplify notation, the required details of the E-step are presented
here for the sums, sums of squares and products of $Z = (X, Y)$. We let
$f_i(\cdot \mid \phi)$ be the density of $Z_i$ where $\phi = (\beta, \sigma, \theta)$.

By expanding $f_i(Z_i \mid \phi^*)$ about $Z_i^*$ we obtain the first term Taylor
series approximation

$$ f_i(Z_i \mid \phi^*) = f_i^* + \sum_{j=1}^{k+1} (Z_{ij} - Z_{ij}^*) f_{ij}^* \tag{6} $$

where $Z_{ij}$ and $Z_{ij}^*$ denote the elements in $Z_i$ and $Z_i^*$, $f_i^*$ denotes
$f_i(Z_i^* \mid \phi^*)$ and

$$ f_{ij}^* = \left[ \frac{\partial f_i(Z_i \mid \phi^*)}{\partial Z_{ij}} \right]_{Z_i = Z_i^*} . \tag{7} $$

We suppose that $Z_{ij}^*$ is obtained from $Z_{ij}$ by rounding to the center of
an interval of width $\delta_j$, for $j = 1, 2, \ldots, k+1$ and $i = 1, 2, \ldots, n$. The E-
step requires averaging over the conditional distribution of $Z_i$ given that
$Z_i$ lies in the rounding rectangle centered at $Z_i^*$. Dividing the marginal
density (6) by its integral over the rounding rectangle means, to the desired
first order accuracy, dividing (6) by $f_i^* \prod_{j=1}^{k+1} \delta_j$. Denoting by $E$ the
operation of averaging with respect to the appropriately scaled density$^1$, we
find

$$ E(Z_{ij} - Z_{ij}^*) = \frac{\delta_j^2}{12} \frac{f_{ij}^*}{f_i^*} \tag{8} $$

$$ E(Z_{ij} - Z_{ij}^*)^2 = \frac{\delta_j^2}{12} \tag{9} $$

$$ E(Z_{ij} - Z_{ij}^*)(Z_{il} - Z_{il}^*) = 0 \tag{10} $$

for all $i, j, l$ with $j \ne l$.

$^1$ E.g.,
$$ E(Z_{ij} - Z_{ij}^*) = \int_{-\delta_1/2}^{\delta_1/2} \cdots \int_{-\delta_{k+1}/2}^{\delta_{k+1}/2} t_{ij} \Big[ f_i^* + \sum_{l=1}^{k+1} t_{il} f_{il}^* \Big] \prod_{l=1}^{k+1} dt_{il} \Big/ f_i^* \prod_{l=1}^{k+1} \delta_l , $$
where $t_{il} = (Z_{il} - Z_{il}^*)$.

From (8), (9) and (10) we obtain

$$ E(Z_{ij}) = Z_{ij}^* + \frac{\delta_j^2}{12} \frac{f_{ij}^*}{f_i^*} \tag{11} $$

and

$$ E(Z_{ij}^2) = Z_{ij}^{*2} + \frac{\delta_j^2}{12} + 2 Z_{ij}^* \frac{\delta_j^2}{12} \frac{f_{ij}^*}{f_i^*} \tag{12} $$

$$ E(Z_{ij} Z_{il}) = Z_{ij}^* Z_{il}^* + \frac{\delta_j^2}{12} Z_{il}^* \frac{f_{ij}^*}{f_i^*} + \frac{\delta_l^2}{12} Z_{ij}^* \frac{f_{il}^*}{f_i^*} \tag{13} $$

for all $i, j, l$ with $j \ne l$. Note that the ratio $f_{ij}^*/f_i^*$ is the partial
derivative of $\log f_i(Z_i \mid \phi^*)$ with respect to $Z_{ij}$ at $Z_i = Z_i^*$.
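
The first order moments of the rounding error, $(\delta_j^2/12)(f_{ij}^*/f_i^*)$ for the mean and $\delta_j^2/12$ for the mean square, can be checked numerically in one dimension. The sketch below assumes a standard normal density, a rounded value of 1.0 and a width of 0.1; these values and the function name `conditional_moments` are illustrative only.

```python
import numpy as np


def conditional_moments(z_star, delta, m=200_001):
    """Mean and mean square of Z - z_star for a standard normal Z,
    conditional on Z lying in the rounding interval of width delta
    centered at z_star (midpoint-rule numerical integration)."""
    t = ((np.arange(m) + 0.5) / m - 0.5) * delta   # midpoints of a fine grid
    w = np.exp(-0.5 * (z_star + t) ** 2)           # normal density up to a constant
    w /= w.sum()                                   # normalize within the interval
    return float(w @ t), float(w @ t**2)


z_star, delta = 1.0, 0.1
mean_t, mean_t2 = conditional_moments(z_star, delta)

score = -z_star                                    # d/dz log f(z) at z_star for N(0,1)
print(mean_t, delta**2 / 12 * score)               # conditional mean vs first order value
print(mean_t2, delta**2 / 12)                      # conditional mean square vs delta^2/12
```

The agreement is only to first order: the neglected terms are of higher order in the width, so the match degrades as `delta` grows relative to the scale of the density.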

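The E-step is completed by summing (11), (12), and (13) over $i =
1, 2, \ldots, n$, whence the required adjustments to the sufficient statistics
$\frac{1}{n} \sum_{i=1}^{n} Z_{ij}^*$, $\frac{1}{n} \sum_{i=1}^{n} Z_{ij}^{*2}$ and $\frac{1}{n} \sum_{i=1}^{n} Z_{ij}^* Z_{il}^*$, $j \ne l$, are respectively,

$$ \frac{\delta_j^2}{12} \, \frac{1}{n} \sum_{i=1}^{n} \frac{f_{ij}^*}{f_i^*} , \tag{14} $$

$$ \frac{\delta_j^2}{12} \left[ 1 + \frac{2}{n} \sum_{i=1}^{n} Z_{ij}^* \frac{f_{ij}^*}{f_i^*} \right] , \tag{15} $$

and

$$ \frac{\delta_j^2}{12} \, \frac{1}{n} \sum_{i=1}^{n} Z_{il}^* \frac{f_{ij}^*}{f_i^*} + \frac{\delta_l^2}{12} \, \frac{1}{n} \sum_{i=1}^{n} Z_{ij}^* \frac{f_{il}^*}{f_i^*} , \quad j \ne l . \tag{16} $$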
4. Special Cases Leading to the Use of Sheppard's Corrections

There are two particular cases where the likelihood analysis of Section 3
for small rounding error leads to Sheppard's corrections for regression
coefficients: (1) when the rows of $X$ are a normal sample and (2) when the
rows of $X$ are a "regular" sample and $n$ tends to infinity. The second case
is more fundamental because the entire likelihood analysis leading to the use
of maximum likelihood estimates is predicated on large samples.

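When $X$ is normal, $Z$ is normal with mean, say, $\mu$ and variance $\Sigma$. We
start the EM algorithm at the usual moment estimators based on rounded data,
say $\bar{Z}^*$ and $S^*$. Then at $\mu = \bar{Z}^*$ and $\Sigma = S^*$,

$$ \frac{f_{ij}^*}{f_i^*} = \left[ -\frac{1}{2} \frac{\partial}{\partial Z_{ij}} (Z_i - \bar{Z}^*) S^{*-1} (Z_i - \bar{Z}^*)^T \right]_{Z_i = Z_i^*} = \text{the } j\text{th component of } -(Z_i^* - \bar{Z}^*) S^{*-1} . \tag{17} $$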

We are now ready to calculate the E-step, that is, to calculate the
adjustments to the sufficient statistics. From (14) we see that the
adjustment to $\frac{1}{n} \sum_i Z_{ij}$ is zero because $\sum_i (Z_i^* - \bar{Z}^*) = 0$. From (15) the
adjustment to the quadratic sufficient statistic $\frac{1}{n} \sum_i Z_{ij}^2$ is $-\delta_j^2/12$, and
from (16) the adjustment for the quadratic sufficient statistic
$\frac{1}{n} \sum_i Z_{ij} Z_{il}$, $j \ne l$, is zero; these follow because

$$ \sum_i Z_i^{*T} \left( f_{i1}^*/f_i^*, \ldots, f_{i,k+1}^*/f_i^* \right) = \sum_i Z_i^{*T} \left[ -(Z_i^* - \bar{Z}^*) S^{*-1} \right] = -nI . \tag{18} $$


Consequently, when $(Y, X)$ is jointly normal and the rounding errors are
vanishingly small, maximum likelihood estimates of regression parameters of
$Y$ and $X$ are obtained by applying Sheppard's corrections to the covariance
matrix of $(Y, X)$: simply subtract $\delta_j^2/12$ from the corresponding diagonal
element of the covariance matrix where $\delta_j$ is the width of the rounding
rectangle. The likelihood justification for the use of Sheppard's correction
with small rounding error and univariate normal data appears in Fisher (1922)
and Lindley (1950).
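
The matrix identity behind the normal case, that the sum over rows of $Z_i^{*T}$ times the normal score row equals $-nI$ when the moment estimators use divisor $n$, can be verified on arbitrary rounded data. The sketch below uses hypothetical data and illustrative names; it also evaluates the bracket of (15), which comes out as $-1$ for every variable and so gives the adjustment $-\delta_j^2/12$.

```python
import numpy as np

rng = np.random.default_rng(2)                     # arbitrary seed
n, p = 500, 3
z_star = np.round(rng.standard_normal((n, p)), 1)  # hypothetical rounded rows Z_i*

z_bar = z_star.mean(axis=0)
centered = z_star - z_bar
s_star = centered.T @ centered / n                 # moment estimator with divisor n

# Row i holds the normal score at Z_i*, i.e. f*_ij / f*_i for j = 1..p.
scores = -centered @ np.linalg.inv(s_star)

lhs = z_star.T @ scores                            # sum_i Z_i*^T (score row i)
bracket = 1.0 + 2.0 * np.diag(lhs) / n             # bracket of (15), per unit delta_j^2/12

print(np.round(lhs, 6))
print(bracket)
```

The identity is exact for any data set, which is why the normal-theory result does not depend on the sample size.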

The second case for Sheppard's corrections treats large samples from
regular $X$. As $n \to \infty$, the summations $\frac{1}{n} \sum_{i=1}^{n}$ in (14), (15) and (16) can be
replaced by expectations over the distribution of $Z_i^*$; doing so, we obtain
simplified first order corrections appropriate in large samples.
Specifically, the correction to $\frac{1}{n} \sum_i Z_{ij}$ is

$$ \frac{\delta_j^2}{12} E^* \left\{ \frac{\partial}{\partial Z_{ij}} \log f(Z_i^* \mid \phi^*) \right\} , \tag{19} $$

the correction to $\frac{1}{n} \sum_i Z_{ij}^2$ is

$$ \frac{\delta_j^2}{12} \left[ 1 + 2 E^* \left\{ Z_{ij}^* \frac{\partial}{\partial Z_{ij}} \log f(Z_i^* \mid \phi^*) \right\} \right] , \tag{20} $$

and the correction to $\frac{1}{n} \sum_i Z_{ij} Z_{il}$, $j \ne l$, is

$$ \frac{\delta_j^2}{12} E^* \left\{ Z_{il}^* \frac{\partial}{\partial Z_{ij}} \log f(Z_i^* \mid \phi^*) \right\} + \frac{\delta_l^2}{12} E^* \left\{ Z_{ij}^* \frac{\partial}{\partial Z_{il}} \log f(Z_i^* \mid \phi^*) \right\} , \tag{21} $$

where $\phi^*$ is the maximum likelihood estimate of $\phi$ assuming the rounded data
were unrounded, and $E^*$ is the expectation over the distribution of $Z_i^*$. As
all $\delta_j \to 0$, an expectation over the distribution of rounded data, $Z_i^*$, will
equal the corresponding expectation over the distribution of unrounded data,
$Z_i$, plus terms of order $\delta_j$ and higher. Since each expectation in (19) -
(21) is multiplied by a factor of $\delta_j^2$, to order $\delta_j^2$, expectations $E^*$ over
the distribution of $Z_i^*$ may be replaced by expectations $E$ over the
distribution of $Z_i$. Then, if $f(Z \mid \phi)$ is sufficiently smooth to allow us
to interchange the order of integration and differentiation in these
expectations, expressions (19) - (21) exactly equal their values under
normality, as we now show.

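Consider first the expectation in expression (19),
$E\{ \frac{\partial}{\partial Z_j} [\log f(Z)] \}$, where for notational convenience we suppress the
irrelevant subscript $i$ and replace $f(Z \mid \phi)$ by $f(Z)$. For all $u$,

$$ \int f(Z - u) \, dZ = 1 . $$

Thus,

$$ \frac{\partial}{\partial u_j} \int f(Z - u) \, dZ = 0 . $$

Letting $D_j$ refer to the partial derivative with respect to the $j$th argument,
and passing the derivative through the integral gives

$$ \int [-D_j f(Z - u)] \, dZ = 0 , $$

and letting $u = 0$ implies

$$ \int \frac{\partial}{\partial Z_j} f(Z) \, dZ = 0 , $$

or

$$ \int \left( \frac{\partial}{\partial Z_j} [\log f(Z)] \right) f(Z) \, dZ = 0 . $$

Thus,

$$ E\left\{ \frac{\partial}{\partial Z_j} [\log f(Z)] \right\} = 0 . $$

Next consider the expectation in expression (20),
$E\{ Z_j \frac{\partial}{\partial Z_j} [\log f(Z)] \}$. For all $u$,

$$ \int Z_j f(Z - u) \, dZ = u_j + \int Z_j f(Z) \, dZ . \tag{22} $$

Hence

$$ \frac{\partial}{\partial u_j} \int Z_j f(Z - u) \, dZ = 1 . $$

Passing the derivative through the integral gives

$$ \int Z_j [-D_j f(Z - u)] \, dZ = 1 . $$

Letting $u = 0$ gives

$$ E\left\{ Z_j \frac{\partial}{\partial Z_j} [\log f(Z)] \right\} = -1 . $$

Finally, in order to evaluate the expectation in (21), from (22) written for
$Z_l$,

$$ \frac{\partial}{\partial u_j} \int Z_l f(Z - u) \, dZ = 0 , \quad l \ne j . $$

Passing the derivative through the integral and letting $u = 0$ gives

$$ E\left\{ Z_l \frac{\partial}{\partial Z_j} [\log f(Z)] \right\} = 0 , \quad l \ne j . $$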

The regularity condition can fail. For example, with uniformly
distributed $X$, the correction to the variance is equal to Sheppard's
correction, but opposite in sign, a fact pointed out by Elderton (1938).
Moreover, when the second derivative of the density of $X$ is large in
absolute value, as with short, abrupt-tailed distributions, the value of the
appropriate maximum likelihood correction with finite $n$ can be quite far
from its large sample limit because $\frac{\partial}{\partial X_{ij}} \log f(X_i \mid \theta)$ has large variance.
Long tailed distributions for $X_i$ like the Cauchy do not offer any problems
with respect to the variance of this partial derivative, but do require
larger $n$ for the likelihood of $\theta$ to be sufficiently concentrated about
$\theta^*$ to justify the approximations used here.
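
The large-sample identities can be illustrated for a smooth non-normal density. The sketch below assumes a standard logistic density, chosen here only as an example of a regular distribution, and evaluates the two expectations by a simple Riemann sum; the uniform case is handled analytically in the final comment.

```python
import numpy as np

# Fine grid for numerical integration; the logistic density is negligible
# beyond +-40, so truncating there does not affect the sums noticeably.
z = np.linspace(-40.0, 40.0, 80_001)
dz = z[1] - z[0]

f = 0.25 / np.cosh(z / 2.0) ** 2        # standard logistic density
score = -np.tanh(z / 2.0)               # d/dz log f(z)

e_score = np.sum(score * f) * dz        # expectation of the score; near 0
e_z_score = np.sum(z * score * f) * dz  # expectation of Z times the score; near -1

# Bracket multiplying delta^2/12 in the variance correction: 1 + 2*(-1) = -1,
# i.e. Sheppard's correction, as under normality.
variance_bracket = 1.0 + 2.0 * e_z_score
print(e_score, e_z_score, variance_bracket)

# For a uniform density the score is zero inside the support, so the same
# bracket is 1 + 2*0 = +1: Sheppard's correction with the sign reversed.
uniform_bracket = 1.0 + 2.0 * 0.0
```

The logistic case works because the density vanishes smoothly in the tails, so the boundary terms in the integration by parts are zero; the uniform density has jumps at the ends of its support and the interchange of differentiation and integration is no longer valid.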
5. Conclusions


With  small  enough  rounding  errors  and  a  large  enough  sample,  our  analysis 
and  example  suggest  that  Sheppard's  corrections  applied  to  the  cross  products 
matrix  of  independent  variables  will  generate  appropriate  corrections  to  the 
regression  coefficients  in  normal  linear  regression  analyses.  With  moderate 
rounding  errors  or  moderate  sample  size,  however,  it  appears  that  a  serious 
attack  on  the  problem  must  confront  the  fact  that  valid  inferences  for  the 
regression  coefficients  will  vary  with  the  specification  of  the  distributional 
form  of  the  independent  variables.  Further  research  will  be  needed  before  the 
limits  on  the  practical  usefulness  of  Sheppard's  corrections  can  be  stated. 
Experience  with  various  plausible  choices  of  distributions  for  the  independent 
variables  will  require  development  of  feasible  computational  tools. 